


Creators/Authors contains: "Juan, David"



  1. Abstract The common vampire bat (Desmodus rotundus) is a sanguivorous (i.e., blood-eating) bat species distributed in the Americas from northern Mexico southward to central Chile and Argentina. Desmodus rotundus is one of only three mammal species known to feed exclusively on blood, mainly from domestic mammals, although large wildlife and occasionally humans can also serve as a food source. Blood feeding makes D. rotundus an effective transmitter of pathogens to its prey. Consequently, this species is a common target of culling efforts by various individuals and organizations. Nevertheless, little is known about the historical distribution of D. rotundus. Detailed occurrence data are critical for accurately assessing the past and current distributions of D. rotundus in ecological, biogeographical, and epidemiological research. This article presents a dataset of D. rotundus historical occurrence reports, comprising more than 39,000 locality reports across the Americas, to facilitate spatiotemporal studies of the species. Data are available at 10.6084/m9.figshare.15025296.
  2. Abstract

    Somatic mosaicism is defined as the occurrence, within an individual derived from a single zygote, of two or more cell populations whose genomic sequences differ at given loci. It is a characteristic of multicellular organisms that plays a crucial role in normal development and disease. To study the nature and extent of somatic mosaicism in autism spectrum disorder, bipolar disorder, focal cortical dysplasia, schizophrenia, and Tourette syndrome, a multi-institutional consortium called the Brain Somatic Mosaicism Network (BSMN) was formed through the National Institute of Mental Health (NIMH). In addition to genomic data from affected and neurotypical brains, the BSMN developed and validated a best-practice workflow for calling somatic single nucleotide variants through the analysis of reference brain tissue. These resources, which include more than 400 terabytes of data from 1087 subjects, are now available to the research community via the NIMH Data Archive (NDA) and are described here.

     
  3. The rich diversity of morphology and behavior displayed across primate species provides an informative context in which to study the impact of genomic diversity on fundamental biological processes. Analysis of that diversity provides insight into long-standing questions in evolutionary and conservation biology and is urgent given the severe threats these species face. Here, we present high-coverage whole-genome data from 233 primate species representing 86% of genera and all 16 families. Together with fossil calibration, this dataset was used to build a nuclear DNA phylogeny and to reassess evolutionary divergence times among primate clades. We found within-species genetic diversity across families and geographic regions to be associated with climate and sociality, but not with extinction risk. Furthermore, mutation rates differ across species, potentially influenced by effective population sizes. Lastly, we identified extensive recurrence of missense mutations previously thought to be human specific. This study will open a wide range of avenues for future primate genomic research.
    Free, publicly-accessible full text available June 2, 2024
  4. null (Ed.)
  5. Abstract

    Amazonian waters are classified into three biogeochemical categories by dissolved nutrient content, sediment type, transparency, and acidity, all important predictors of autochthonous and allochthonous primary production (PP): (1) nutrient-poor, low-sediment, high-transparency, humic-stained, acidic blackwaters; (2) nutrient-poor, low-sediment, high-transparency, neutral clearwaters; (3) nutrient-rich, low-transparency, alluvial sediment-laden, neutral whitewaters. The classification, first proposed by Alfred Russel Wallace in 1853, is well supported, but its effects on fish are poorly understood. To investigate how water type influences Amazonian fish community composition and species richness, we conducted quantitative year-round sampling of floodplain lake and river-margin habitats at a locality where all three water types co-occur. We sampled 22,398 fish from 310 species. Community composition was influenced more by water type than by habitat. Whitewater communities were distinct from those of blackwaters and clearwaters, with community structure strongly correlated with conductivity and turbidity. Mean per-sampling-event species richness and biomass were significantly higher in nutrient-rich whitewater floodplain lakes than in oligotrophic blackwater and clearwater river-floodplain systems and in light-limited whitewater rivers. Our study provides novel insights into the influences of biogeochemical water type and ecosystem productivity on Earth's most diverse aquatic vertebrate fauna and highlights the importance of including multiple water types in conservation planning.

     
  6. True, John (Ed.)
    Abstract Novel phenotypes are commonly associated with gene duplication and neofunctionalization; cases of phenotypic maintenance through the recruitment of novel genes are less well documented. Proteolysis is the primary toxic character of many snake venoms, and ADAM metalloproteinases, named snake venom metalloproteinases (SVMPs), are widely recognized as the major effectors of this phenotype. However, by investigating original transcriptomes from 58 species of advanced snakes (Caenophidia) across their phylogeny, we discovered that a different enzyme, matrix metalloproteinase (MMP), is actually the dominant venom component in three tribes (Tachymenini, Xenodontini, and Conophiini) of rear-fanged snakes (Dipsadidae). Proteomic and functional analyses of these venoms further indicate that MMPs likely play an "SVMP-like" role in the proteolytic phenotype. A detailed look into the venom-specific sequences revealed a new, highly expressed MMP subtype, named snake venom MMP (svMMP), which originated independently on at least three occasions from an endogenous MMP-9. We further show that, by losing ancillary noncatalytic domains present in their ancestors, svMMPs followed an evolutionary path toward a simplified structure during their expansion in the genomes, paralleling what has been proposed for the evolution of their Viperidae counterparts, the SVMPs. Moreover, we inferred an inverse relationship between the expression of svMMPs and SVMPs along the evolutionary history of Xenodontinae, suggesting that one type of enzyme may substitute for the other while the general (metallo)proteolytic phenotype is maintained. These results provide rare evidence of how relevant phenotypic traits can be optimized via natural selection on nonhomologous genes, yielding alternate biochemical components.
  7. INTRODUCTION Resolving the role that different environmental forces may have played in the apparent explosive diversification of modern placental mammals is crucial to understanding the evolutionary context of their living and extinct morphological and genomic diversity.

    RATIONALE Limited access to whole-genome sequence alignments that sample living mammalian biodiversity has hampered phylogenomic inference, which until now has been limited to relatively small, highly constrained sequence matrices often representing <2% of a typical mammalian genome. To eliminate this sampling bias, we used an alignment of 241 whole genomes to comprehensively identify and rigorously analyze noncoding, neutrally evolving sequence variation in coalescent and concatenation-based phylogenetic frameworks. These analyses were followed by validation with multiple classes of phylogenetically informative structural variation. This approach enabled the generation of a robust time tree for placental mammals that evaluated age variation across hundreds of genomic loci not restricted by protein-coding annotations.

    RESULTS Coalescent and concatenation phylogenies inferred from multiple treatments of the data were highly congruent, including support for higher-level taxonomic groupings that unite primates+colugos with treeshrews (Euarchonta), bats+cetartiodactyls+perissodactyls+carnivorans+pangolins (Scrotifera), all scrotiferans excluding bats (Fereuungulata), and carnivorans+pangolins with perissodactyls (Zooamata). However, because these approaches infer a single best tree, they mask signatures of phylogenetic conflict that result from incomplete lineage sorting and historical hybridization. Accordingly, we also inferred phylogenies from thousands of noncoding loci distributed across chromosomes with historically contrasting recombination rates. Throughout the radiation of modern orders (such as rodents, primates, bats, and carnivores), we observed notable differences between locus trees inferred from the autosomes and the X chromosome, a pattern typical of speciation with gene flow. We show that in many cases, previously controversial phylogenetic relationships can be reconciled by examining the distribution of conflicting phylogenetic signals along chromosomes with variable historical recombination rates. Lineage divergence time estimates were notably uniform across genomic loci and robust to extensive sensitivity analyses in which the underlying data, fossil constraints, and clock models were varied. The earliest branching events in the placental phylogeny coincide with the breakup of continental landmasses and rising sea levels in the Late Cretaceous. This signature of allopatric speciation is congruent with the low genomic conflict inferred for most superordinal relationships. By contrast, we observed a second pulse of diversification immediately after the Cretaceous-Paleogene (K-Pg) extinction event, superimposed on an episode of rapid land emergence. Greater geographic continuity coupled with tumultuous climatic changes and increased ecological landscape at this time provided enhanced opportunities for mammalian diversification, as depicted in the fossil record. These observations dovetail with the increased phylogenetic conflict observed within clades that diversified in the Cenozoic.

    CONCLUSION Our genome-wide analysis of multiple classes of sequence variation provides the most comprehensive assessment of placental mammal phylogeny, resolves controversial relationships, and clarifies the timing of mammalian diversification. We propose that the combination of Cretaceous continental fragmentation and lineage isolation, followed by the direct and indirect effects of the K-Pg extinction at a time of rapid land emergence, synergistically contributed to the accelerated diversification rate of placental mammals during the early Cenozoic.

    The timing of placental mammal evolution. Superordinal mammalian diversification took place in the Cretaceous during periods of continental fragmentation and sea level rise with little phylogenomic discordance (pie charts: left, autosomes; right, X chromosome), which is consistent with allopatric speciation. By contrast, the Paleogene hosted intraordinal diversification in the aftermath of the K-Pg mass extinction event, when clades exhibited higher phylogenomic discordance consistent with speciation with gene flow and incomplete lineage sorting.
    Free, publicly-accessible full text available April 28, 2024
  8. INTRODUCTION Diverse phenotypes, including large brains relative to body size, group living, and vocal learning ability, have evolved multiple times throughout mammalian history. These shared phenotypes may have arisen repeatedly by means of common mechanisms discernible through genome comparisons.

    RATIONALE Protein-coding sequence differences have failed to fully explain the evolution of multiple mammalian phenotypes. This suggests that these phenotypes have evolved at least in part through changes in gene expression, meaning that their differences across species may be caused by differences in genome sequence at enhancer regions that control gene expression in specific tissues and cell types. Yet the enhancers involved in phenotype evolution are largely unknown. Sequence conservation–based approaches for identifying such enhancers are limited because enhancer activity can be conserved even when the individual nucleotides within the sequence are poorly conserved: in many cases, nucleotides turn over at a high rate while a similar combination of transcription factor binding sites and other sequence features is maintained across millions of years of evolution, preserving the enhancer's function in a particular cell type or tissue. Experimentally measuring the function of orthologous enhancers across dozens of species is currently infeasible, but new machine learning methods make it possible to reliably predict enhancer function from sequence across species in specific tissues and cell types.

    RESULTS To overcome the limits of studying individual nucleotides, we developed the Tissue-Aware Conservation Inference Toolkit (TACIT). Rather than measuring the extent to which individual nucleotides are conserved across a region, TACIT uses machine learning to test whether the function of a given part of the genome is likely to be conserved. More specifically, convolutional neural networks learn the tissue- or cell type–specific regulatory code connecting genome sequence to enhancer activity using candidate enhancers identified from only a few species. This approach allows us to accurately associate differences between species in tissue- or cell type–specific enhancer activity with genome sequence differences at enhancer orthologs. We then connect these predictions of enhancer function to phenotypes across hundreds of mammals in a way that accounts for species' phylogenetic relatedness. We applied TACIT to identify candidate enhancers from motor cortex and parvalbumin neuron open chromatin data that are associated with brain size relative to body size, solitary living, and vocal learning across 222 mammals. Our results include the identification of multiple candidate enhancers associated with brain size relative to body size, several of which are located in linear or three-dimensional proximity to genes whose protein-coding mutations have been implicated in microcephaly or macrocephaly in humans. We also identified candidate enhancers associated with the evolution of solitary living near a gene implicated in separation anxiety, as well as other enhancers associated with the evolution of vocal learning ability. We obtained distinct results for bulk motor cortex and parvalbumin neurons, demonstrating the value of applying TACIT to both bulk tissue and specific minority cell type populations. To facilitate future analyses of our results and applications of TACIT, we released predicted enhancer activity for more than 400,000 candidate enhancers in each of 222 mammals, together with their associations with the phenotypes we investigated.

    CONCLUSION TACIT leverages predicted enhancer activity conservation rather than nucleotide-level conservation to connect genetic sequence differences between species to phenotypes across large numbers of mammals. TACIT can be applied to any phenotype for which enhancer activity data are available from at least a few species in a relevant tissue or cell type and a whole-genome alignment is available across dozens of species with substantial phenotypic variation. Although we developed TACIT for transcriptional enhancers, it could also be applied to genomic regions involved in other components of gene regulation, such as promoters and splicing enhancers and silencers. As the number of sequenced genomes grows, machine learning approaches such as TACIT have the potential to help make sense of how conservation of, or changes in, subtle genome patterns can help explain phenotype evolution.

    Tissue-Aware Conservation Inference Toolkit (TACIT) associates genetic differences between species with phenotypes. TACIT works by generating open chromatin data from a few species in a tissue related to a phenotype, using the sequences underlying open and closed chromatin regions to train a machine learning model that predicts tissue-specific open chromatin, and associating open chromatin predictions across dozens of mammals with the phenotype. [Species silhouettes are from PhyloPic]
    Free, publicly-accessible full text available April 28, 2024
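The somatic mosaicism abstract (item 2) turns on cell populations that carry a variant in only some cells, which in sequencing data typically shows up as a variant allele fraction (VAF) well below the roughly 0.5 expected for a germline heterozygous site. The sketch below illustrates that idea only; it is not the BSMN best-practice workflow, and the one-sided binomial test, the `alpha` and `min_alt` thresholds, and all function names are hypothetical assumptions chosen for illustration.

```python
from math import comb


def variant_allele_fraction(alt_reads: int, ref_reads: int) -> float:
    """Fraction of reads at one site that support the alternate allele."""
    depth = alt_reads + ref_reads
    if depth == 0:
        raise ValueError("no reads at this site")
    return alt_reads / depth


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def is_candidate_mosaic(alt_reads: int, ref_reads: int,
                        alpha: float = 0.01, min_alt: int = 3) -> bool:
    """Hypothetical filter: flag sites whose VAF is significantly below the
    0.5 expected for a germline heterozygous variant."""
    if alt_reads < min_alt:
        # Too little alternate-read support to separate from sequencing error.
        return False
    depth = alt_reads + ref_reads
    # One-sided test: probability of seeing this few alt reads if VAF were 0.5.
    p_value = binom_cdf(alt_reads, depth, 0.5)
    return p_value < alpha


if __name__ == "__main__":
    for alt, ref in [(10, 190), (95, 105), (2, 198)]:
        print(alt, ref, round(variant_allele_fraction(alt, ref), 3),
              is_candidate_mosaic(alt, ref))
```

Real somatic-variant callers also model sequencing error, mapping artifacts, and copy-number changes; a bare binomial test against 0.5 is only the simplest way to express "fewer supporting reads than a heterozygous variant would give".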
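The TACIT abstract (item 8) says convolutional neural networks learn a regulatory code that connects genome sequence to enhancer activity. A common way to present DNA to such a network is one-hot encoding, after which a first-layer convolutional filter acts like a position weight matrix scored at every window. The sketch below assumes that input representation; the abstract does not describe TACIT's preprocessing or architecture, and `one_hot` and `conv1d_scores` are hypothetical helpers, not TACIT code.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA sequence as a (length, 4) float array in A, C, G, T order.

    Ambiguous bases such as N become all-zero rows.
    """
    index = {base: i for i, base in enumerate(BASES)}
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        j = index.get(base)
        if j is not None:
            out[pos, j] = 1.0
    return out


def conv1d_scores(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 1-D cross-correlation of a one-hot sequence (L, 4) with a
    filter (k, 4): the operation a single CNN filter performs, without bias
    or nonlinearity."""
    k = kernel.shape[0]
    return np.array([np.sum(x[i:i + k] * kernel) for i in range(x.shape[0] - k + 1)])


if __name__ == "__main__":
    seq = one_hot("ACGTCG")
    motif = one_hot("CG")  # toy filter that rewards an exact "CG" match
    print(conv1d_scores(seq, motif))
```

In a trained network, the filter weights are learned rather than hand-set, and many filters are stacked with nonlinearities and pooling; the hand-built "CG" filter here only shows why a convolution over one-hot DNA behaves like a motif scanner.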
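The TACIT abstract (item 8) also says that predicted enhancer activity is connected to phenotypes "in a way that accounts for species' phylogenetic relatedness" without naming the model. One standard option for a continuous trait such as relative brain size is phylogenetic generalized least squares (PGLS) under a Brownian-motion covariance, sketched below as an assumption; binary traits such as solitary living would instead call for something like phylogenetic logistic regression. The 4-species tree, the activity values, and the phenotype values are all hypothetical.

```python
import numpy as np


def pgls_fit(y: np.ndarray, X: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Generalized least-squares coefficients beta = (X' V^-1 X)^-1 X' V^-1 y.

    V is the species-by-species covariance implied by the phylogeny; under
    Brownian motion, cov(i, j) is the branch length species i and j share
    from the root.
    """
    Vinv_X = np.linalg.solve(V, X)  # V^-1 X without forming the inverse
    Vinv_y = np.linalg.solve(V, y)  # V^-1 y
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)


# Hypothetical tree ((A:1,B:1):1,(C:1,D:1):1): each tip lies 2 units from the
# root, and each pair of sister species shares the 1-unit internal branch.
V_toy = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])

# Hypothetical predicted enhancer activity per species (predictor) and a
# continuous phenotype (response).
activity = np.array([0.1, 0.4, 0.7, 0.9])
phenotype = np.array([0.8, 1.2, 1.9, 2.1])

X = np.column_stack([np.ones_like(activity), activity])  # intercept + slope columns

if __name__ == "__main__":
    intercept, slope = pgls_fit(phenotype, X, V_toy)
    print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

With `V` set to the identity matrix this reduces to ordinary least squares; the off-diagonal covariances are what down-weight evidence from closely related species that share a trait by descent rather than independently.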